Generalizability of the WEBSOM Method to Document Collections of Various Types

Author

  • Krista Lagus
Abstract

WEBSOM is a method in which the self-organizing map algorithm is used to automatically organize collections of documents on a map to enable easy exploration of the collection. This article illustrates with case studies how collections of various types of text can be successfully organized using the WEBSOM. The emphasis is on describing the particular challenges that each type of material poses, as well as on identifying properties of a text collection that affect the choices made at each processing stage. Properties such as the size of the document collection, the size of the vocabulary, the domain, the style of writing, and the language are considered.
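
The core idea of placing documents on a self-organizing map can be illustrated with a minimal sketch. It is not the WEBSOM implementation: the abstract does not describe the preprocessing or training details, so the bag-of-words encoding, the 3x3 grid, the learning-rate and radius schedules, and every function name below (`bag_of_words`, `best_matching_unit`, `train_som`) are assumptions chosen for illustration, and the toy documents are made up.

```python
import math
import random


def bag_of_words(docs):
    """Turn raw strings into length-normalized term-frequency vectors."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w in d.lower().split():
            v[index[w]] += 1.0
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors


def best_matching_unit(weights, v):
    """Return the (row, col) of the map unit whose weights are closest to v."""
    best, best_dist = None, float("inf")
    for pos, w in weights.items():
        dist = sum((a - b) ** 2 for a, b in zip(w, v))
        if dist < best_dist:
            best, best_dist = pos, dist
    return best


def train_som(vectors, rows=3, cols=3, steps=500, seed=0):
    """Train a small SOM with a Gaussian neighborhood (illustrative schedule)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(steps):
        frac = 1.0 - t / steps
        alpha = 0.5 * frac                            # decaying learning rate
        sigma = max(0.5, max(rows, cols) / 2 * frac)  # decaying neighborhood radius
        v = rng.choice(vectors)
        br, bc = best_matching_unit(weights, v)
        for (r, c), w in weights.items():
            grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_dist2 / (2 * sigma ** 2))
            for i in range(dim):
                # Pull each unit toward the document, weighted by grid proximity.
                w[i] += alpha * h * (v[i] - w[i])
    return weights


# Hypothetical toy collection; the first and last documents are identical.
docs = [
    "neural network learning algorithm",
    "neural network training algorithm",
    "football match score goal",
    "football league goal keeper",
    "neural network learning algorithm",
]
vocab, vectors = bag_of_words(docs)
som = train_som(vectors)
placement = [best_matching_unit(som, v) for v in vectors]
```

Each entry of `placement` is the grid position assigned to the corresponding document; identical documents always share a map unit. Collection size, vocabulary size, and language, the properties the abstract names, would in practice influence choices such as the encoding and the map dimensions, which are fixed arbitrarily here.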

Similar resources

Text Mining with the WEBSOM

The emerging field of text mining applies methods from data mining and exploratory data analysis to analyzing text collections and to conveying information to the user in an intuitive manner. Visual, map-like displays provide a powerful and fast medium for portraying information about large collections of text. Relationships between text items and collections, such as similarity, clusters, gaps a...

Statistical Aspects of the WEBSOM System in Organizing Document Collections

WEBSOM is a novel method for organizing document collections onto map displays to enhance the interactive browsing and retrieval of the documents. The map is organized automatically according to the contents of the full-text documents by the Self-Organizing Map algorithm. The map display provides a visual overview of the whole document collection. The overview, the map display, aids in the exp...

Exploration of Full-text Databases with Self-organizing Maps

Availability of large full-text document collections in electronic form has created a need for intelligent information retrieval techniques. Especially the expanding World Wide Web presupposes methods for systematic exploration of miscellaneous document collections. In this paper we introduce a new method, the WEBSOM, for this task. Self-Organizing Maps (SOMs) are used to represent documents on...

WEBSOM - Self-organizing maps of document collections

Searching for relevant text documents has traditionally been based on keywords and Boolean expressions of them. Often the search results show high recall and low precision, or vice versa. Considerable efforts have been made to develop alternative methods, but their practical applicability has been low. Powerful methods are needed for the exploration of miscellaneous document collections. The WEB...

Mining massive document collections by the WEBSOM method

A viable alternative to the traditional text-mining methods is the WEBSOM, a software system based on the Self-Organizing Map (SOM) principle. Prior to the searching or browsing operations, this method orders a collection of textual items, say, documents, according to their contents, and maps them onto a regular two-dimensional array of map units. Documents that are similar on the basis of their ...

Publication date: 1998